(57) Abstract: A system (10) for providing a listener
with an augmented audio reality in a geographical
environment includes means (11) for determining the
geographical position (e.g. by GPS) as well as the
orientation of the listener. An audio track
creation system creates audio tracks having a
predetermined spatialization component with respect to
the geographical environment. An audio track rendering
system (12) renders the audio tracks via speakers
(e.g. headphones) using the position and orientation
of the listener so as to preserve
the spatialization components of the audio tracks. A user 
of the system can thereby experience an augmented audio 
reality in which audio tracks pertinent to the geographical 
environment are rendered with fixed spatial coincidence 
with the environment. 

Spatialized Audio System for use in a Geographical Environment

Field of the invention 

The present invention relates to the field of immersive audio environments and, in
particular, discloses an immersive environment utilising adaptive tracking capabilities.

Background of the invention

Humans and other animals have evolved to take in and process audio information in 
their environment so as to derive information from that environment. Hence, our ears have
evolved to an extremely complex level to enable us to track accurately the position of an audio 
source around us. 

Further, the provision of audio information is also a highly efficient form of information
provision to humans. This is especially the case in the tourism industry, where the provision of
audio dialogue describing scenery is quite common.

Summary of the invention 

It is an object of the present invention to provide a novel audio immersive experience. 

In accordance with a first aspect of the present invention, there is provided a system for
providing a listener with an augmented audio reality in a geographical environment, the system
comprising: a position locating system for locating a current position of a listener in the
geographical environment; an audio track creation system for creating audio tracks having a
predetermined spatialization component in the geographical environment; an audio track
rendering system adapted to render an audio signal having spatialization components to a series
of speakers surrounding a listener such that the listener experiences an apparent preservation of
the spatialization components in the listening experience; and an audio track playback system
interconnected to the position locating system and the audio track creation system and adapted
to forward predetermined audio tracks to the audio rendering system depending on a user's
location in the audio environment such that the series of speakers locate the predetermined
audio tracks in the environment so as to provide for an augmented audio reality.

In one embodiment the system can simultaneously provide an augmented audio reality to 
multiple listeners located in the geographical environment in a distributed or centralised 
processing manner or a combination of both. 

The position locating system can preferably also locate a current orientation of a
listener's head, and the rendering system can utilize the current orientation in rendering the
spatialization components.

The system has many applicable uses, for example: tourism, outdoor sightseeing,
museum tours, a mobility aid for the blind, industrial applications, artistic performances,
indoor exhibition spaces, outdoor exhibition spaces, tours, exhibitions, city tours (both
guided and self-guided), botanical gardens, zoos, aquariums, entertainment, theme parks,
interactive theme environments, VR games, construction, auditory display of data such as
plans or existing structures below ground, and architectural on-site walk-throughs.

The position locating system can preferably include at least one of a compass, a global
positioning system, a radio frequency positioning system or an electromagnetic wave
positioning system.

In accordance with a further aspect of the present invention, there is provided a system
for providing an immersive audio environment around a listener, the system comprising: an
audio spatialization system for spatializing the audio of a spatialized audio feed around a
listener; an audio customization unit for customizing audio content for the listener, thereby
creating the spatialized audio feed; and a computer network, attached to the audio customization
unit, for downloading the audio content.

In one embodiment, a user feedback unit can be interconnected to the audio
customization unit for monitoring the user's feedback in response to the spatialized audio feed. The
computer network can preferably include audio content indexed by geographical location, and
the audio customization unit can preferably include a text to audio rendering unit for rendering
text into audio.

The feedback unit can preferably include a microphone for monitoring the user's
environment, with the microphone preferably providing spatialization characteristics of the
audio in the user's environment. The audio customization unit can preferably include at least
one personality control unit, customizing the audio content with a personality having
predetermined characteristics.

The audio customization unit can be adapted to send a series of information requests
containing geographical indicators to the network, and receive therefrom a series of responses
containing geographical indicators for rendering to the user. The audio customization unit of a
first user can be adapted to interact with the audio customization units of other users so as to
exchange information. The exchange of information can be dependent on the particular user
with whom an exchange can be made.

The computer network preferably can include a series of portals answering requests for 
information by the audio customization units. The audio portals can include personality 
customized information utilised in answering requests for information. 

Brief description of the drawings 

Preferred embodiments of the present invention will now be described by way of 
example only with reference to the accompanying drawings in which: 

Fig. 1 illustrates schematically the locating of audio objects in a geographical space; 

Fig. 2 illustrates schematically one form of the preferred embodiment;

Fig. 3 illustrates a second embodiment of the present invention; 

Fig. 4 illustrates one form of the VAPA of Fig. 3; 

Fig. 5 illustrates schematically the process of mapping geographic URLs to spatial
locations for use in an audio environment;

Fig. 6 illustrates an alternative embodiment of the present invention;

Figs. 7 and 8 illustrate further alternative embodiments of the present invention.

Description of the preferred and other embodiments 

In the preferred embodiment, there is provided an immersive audio system which 
includes positional tracking information to allow for audio information to be personalised to 
each listener in the environment so they may be provided with an augmented reality. 

Fig. 1 provides an illustration of the operation of the preferred embodiment and includes
a user or listener 1 in an environment. The listener is equipped with headphones 2, which,
depending on the implementation details of the embodiment, can include a set of standard
headphones and an associated audio processing unit, or, for example, a form of
headphones suitably modified to include the significant DSP processing power required to
implement the rendering process required in the preferred embodiment.

The augmented environment includes a series of objects of interest, each of which has a
spatial location and an associated audio track. For example, in a tourism type application, the
objects of interest may be statues or places of interest in the listener's environment. In a gallery
type environment the objects of interest might be paintings or sculptures etc. To the listener, the
object appears to 'talk' to the listener 1. As will become more apparent hereinafter, the
preferred embodiment includes associated audio processing which renders the audio so that it
appears to be coming from the spatial position of the object 4.

Turning now to Fig. 2, there is illustrated one form of implementation of an embodiment
10. The preferred embodiment includes a position detection and orientation system 11 which
locates the listener within a predetermined reference frame. The system 11 can take many
different forms. For example, it can comprise a global positioning system locator to determine a
current spatial location of a listener and an accelerometer device to determine a current
orientation. The accelerometer can take the form of a microelectromechanical system.
Depending on the listener's environment (for example, where the listener is located in a
streetscape), in order to determine more accurately a likely current orientation of a listener, a
velocity component of the listener can be determined from multiple measurements made over a
period of time and, if the listener is moving at a walking pace, a weighting can be applied between
the velocity vector and the accelerometer measurement of orientation. Further, as it is likely that a
person is looking where they are going, the direction of travel can be used to modify the initial
directional vector of the accelerometer. If, however, the accelerometer is of high enough
accuracy, such modifications may not be required. In an alternative arrangement, the earth's
magnetic field could be utilised to determine a current orientation.
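
The weighting between the direction of travel and the sensor-derived orientation can be sketched as follows. This is a minimal illustration only: the function names, the walking-speed threshold and the maximum weight are assumptions rather than values taken from this specification, and positions are assumed to be given in a local metric east/north frame.

```python
import math

def velocity_heading(p0, p1, dt):
    """Return (speed in m/s, heading in radians) from two (east, north) fixes."""
    de, dn = p1[0] - p0[0], p1[1] - p0[1]
    speed = math.hypot(de, dn) / dt
    heading = math.atan2(de, dn)  # 0 = north, positive towards east
    return speed, heading

def blend_heading(sensor_heading, travel_heading, speed,
                  walking_speed=1.0, max_weight=0.5):
    """Blend the sensor heading with the direction of travel.

    The weight given to the direction of travel grows with speed, up to
    max_weight at walking_speed (both values are hypothetical).
    Headings are blended as unit vectors so that wrap-around at +/-pi
    is handled correctly.
    """
    w = max_weight * min(speed / walking_speed, 1.0)
    x = (1 - w) * math.cos(sensor_heading) + w * math.cos(travel_heading)
    y = (1 - w) * math.sin(sensor_heading) + w * math.sin(travel_heading)
    return math.atan2(y, x)
```

With these assumed defaults a stationary listener is oriented purely by the sensor, while at walking pace the sensor and the direction of travel contribute equally.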

The position detection and orientation system outputs a current position and orientation to a
rendering engine 12 and a track player determination unit 13.

A geographical marker database 14 is also provided which includes a series of audio
tracks 15 - 17, with each audio track having associated location information signifying the
location in the augmented environment at which the audio track should occur and from how far
away it should be heard. The track player determination unit 13 utilises the current position
information from the system 11 to determine suitable audio tracks to play around the current
position of the listener 15. The audio tracks are then output with associated location
information to the rendering engine 12. The location information can comprise the
location of the audio source relative to the listener 15.
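
The selection step performed by the track player determination unit might look like the following sketch. The `AudioMarker` record, its field names and the use of a flat local coordinate frame in metres are all assumptions made for illustration; the specification does not prescribe a data layout.

```python
import math
from dataclasses import dataclass

@dataclass
class AudioMarker:
    name: str             # identifier of the audio track (hypothetical field)
    east: float           # marker position in metres, local frame (assumed)
    north: float
    audible_range: float  # distance in metres from which the track is heard

def select_tracks(markers, listener_east, listener_north):
    """Return (marker, relative_east, relative_north) for each audible marker."""
    selected = []
    for m in markers:
        de, dn = m.east - listener_east, m.north - listener_north
        if math.hypot(de, dn) <= m.audible_range:
            selected.append((m, de, dn))
    return selected
```

Each selected track is returned together with its position relative to the listener, which is the information the rendering engine needs.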

The rendering system 12 renders each audio track given a current orientation of a 
listener so that it appears to come from the designated position. 

The rendering system can take many forms. For example, United States Standard
Application No 08/893848, which claims priority from Australian Provisional Application No.
PO0996, the contents of both of which are specifically incorporated by cross reference, discloses a
system for rendering a B-formatted sound source in a head tracked environment at a particular
location relative to a listener. Hence, if the audio tracks are stored in a B-format then such a
system, suitably adapted, can be used to render the audio tracks. One example of where such a
system is suitable is where the B-format part of the rendering is to be done centrally, and the
headtracking part (which is applied to the B-format signal to generate the headphone signals) is done
locally. B-field calculation can be expensive and may be done centrally. However, central
computation incurs communication delays, and this may have the effect of introducing latency
in position. The headtracking can be done locally because this is very sensitive to latency.
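
To illustrate why the split is attractive, the sketch below encodes a mono sample into first-order B-format and then applies a head-yaw rotation, which is a cheap operation suited to local processing. It assumes the traditional first-order encoding with W scaled by 1/sqrt(2) and azimuth measured anticlockwise from straight ahead; it is a generic illustration and not the method of the applications cited above. Decoding to headphone signals is omitted.

```python
import math

def encode_b_format(sample, azimuth, elevation=0.0):
    """Encode a mono sample into first-order B-format (W, X, Y, Z).

    Azimuth is measured anticlockwise from straight ahead, in radians.
    """
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return w, x, y, z

def rotate_for_head_yaw(b, head_yaw):
    """Counter-rotate the sound field by the listener's head yaw (radians)."""
    w, x, y, z = b
    c, s = math.cos(head_yaw), math.sin(head_yaw)
    return w, c * x + s * y, -s * x + c * y, z
```

A source encoded 90 degrees to the left ends up straight ahead once the listener has turned 90 degrees to the left, so only this small rotation needs to track the head with low latency.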

Alternatively, Patent Cooperation Treaty Patent PCT/AU99/00242 discloses a system for
Headtracked Processing for headtracked playback of audio and, in particular, in the presence of
head movements. Such a system could be used as the rendering engine by rendering the audio
track to a predetermined format (e.g. Dolby 5.1 channel surround) so as to have a predetermined
location relative to a listener, and, in turn, utilising the system described in the PCT application
to then provide for the localisation of an audio signal in the presence of head movements.

In the further alternative, Patent Cooperation Treaty Patent PCT/AU99/00002 discloses
a system for rendering audio such as Dolby 5.1 channel surround to a listener over headphones
with suitable computational modifications. By locating a sound around a listener utilising
panning of the sound source between virtual speakers and subsequently rendering the speakers
utilising the aforementioned disclosure, it is again possible to spatialise a sound source around a
listener.
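
Panning a source between virtual speakers can be sketched with ordinary constant-power pairwise panning, as below. The speaker azimuths are an assumed five-speaker layout chosen for illustration, and the subsequent rendering of the virtual speakers to headphones (the subject of the cited application) is not shown.

```python
import math

# Assumed virtual speaker azimuths in degrees (0 = straight ahead,
# positive anticlockwise); a hypothetical five-speaker surround layout.
SPEAKERS = [-110.0, -30.0, 0.0, 30.0, 110.0]

def pan_gains(source_az, speakers=SPEAKERS):
    """Constant-power pan of a source between the two adjacent speakers."""
    n = len(speakers)
    order = sorted(range(n), key=lambda i: speakers[i])
    az = (source_az + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    gains = [0.0] * n
    for k in range(n):
        i, j = order[k], order[(k + 1) % n]
        lo = speakers[i]
        span = (speakers[j] - lo) % 360.0  # arc from speaker i to speaker j
        offset = (az - lo) % 360.0         # source position along that arc
        if offset <= span:
            frac = offset / span
            gains[i] = math.cos(frac * math.pi / 2)
            gains[j] = math.sin(frac * math.pi / 2)
            break
    return gains
```

The squared gains always sum to one, so the perceived loudness stays roughly constant as the source moves around the listener.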

Obviously, other known techniques for spatialising sound over headphones could be 
utilised. 

Ideally, the overall system is implemented in the form of a highly integrated Application
Specific Integrated Circuit (ASIC) and associated memory so as to provide for an extremely
compact implementation form. The resulting system allows the wearer to wander at will in
space and experience a three dimensional acoustic simulation that is overlaid on the real
physical space. The sounds heard can be from multiple sources that respond in volume and
position as the person moves, as if they were real and attached to the real world objects. The
system can also include sonic objects that are not connected and have non-physical range
roll-off.
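
The difference between a physical and a non-physical range roll-off can be illustrated with two simple gain laws. Both functions and their parameters are assumptions for illustration; the specification does not define particular roll-off curves.

```python
def physical_gain(distance, reference=1.0):
    """Inverse-distance gain, clamped to 1.0 inside the reference distance."""
    return min(1.0, reference / max(distance, 1e-9))

def non_physical_gain(distance, audible_range):
    """Assumed non-physical law: linear fade reaching zero at audible_range."""
    return max(0.0, 1.0 - distance / audible_range)
```

A non-physical law of this kind lets a designer confine a sound to a chosen zone regardless of how a real source would carry.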

The system has many applications such as artistic performances, indoor exhibition
spaces, outdoor exhibition spaces, tours, exhibitions, city tours (both guided and
self-guided), botanical gardens, zoos, aquariums, entertainment, theme parks, interactive theme
environments, VR games, construction, auditory display of data such as plans or existing
structures below ground, and architectural on-site walk-throughs with interactive auditory display:
"And over here there will be a large pink waterfall, tastefully decorated..." etc.

The system utilises the following elements: listener position and orientation detection;
determination of time at location, and time since start; selection, sequencing and streaming of
relevant sound sources based on the listener position and time at position or time since start
with respect to the sound source nominal location and time sequence; rendering of the streamed
sound sources to headphones, based on their range and orientation to the listener; sound storage
and recall; and processing hardware. Obviously, many variations in these technologies are
possible.
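
The sequencing element, which depends on both position and time at position, could be sketched as a small state holder like the one below. The class name, the zone identifiers and the dwell limit are hypothetical; this is one possible policy, not the one mandated by the specification.

```python
class ZoneSequencer:
    """Pick a sound sequence for a zone from visit count and dwell time."""

    def __init__(self, sequences, dwell_limit=60.0):
        self.sequences = sequences      # zone id -> list of sequence names
        self.dwell_limit = dwell_limit  # seconds before advancing (assumed)
        self.index = {}                 # zone id -> current sequence index
        self.current_zone = None
        self.entered_at = 0.0

    def update(self, zone, now):
        """Return the sequence to play for `zone` at time `now` (seconds)."""
        seqs = self.sequences[zone]
        if zone != self.current_zone:
            # Entering a zone: a first visit starts at 0, a revisit advances.
            self.index[zone] = self.index[zone] + 1 if zone in self.index else 0
            self.current_zone, self.entered_at = zone, now
        elif now - self.entered_at >= self.dwell_limit:
            # The listener has lingered: move on rather than repeat.
            self.index[zone] += 1
            self.entered_at = now
        return seqs[self.index[zone] % len(seqs)]
```

Under this policy a listener who lingers, or who returns to a location, hears a fresh sequence rather than a repeat of the previous one.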

Further, many different formats of implementation are possible in multi-listener
environments. For example, in a centralised implementation all the listener positions can be
acquired, and sound processed and rendered centrally for each listener position and then transmitted on a
separate channel to each listener. In a distributed implementation, a mobile processing station
determines its position and locally processes and renders pre-recorded sound to the listener.

An example, intended to give a sense of the system in use, is set out in the
following fictionalised account:

I am standing in the rue de Rivoli immediately south of the Marais Quartier in Paris. I
am still aware of the busy street sound of the rue Rivoli behind me but now I hear a voice
beckoning me from the entrance of a small side street - I turn to look but no-one is present -
strangely the voice persists and as I walk towards the side street the voice dissolves into
laughter and melts into the sound of running steps which disappear up the narrow street ahead
of me. To my right a street door slams, some footsteps and I am greeted gruffly, the footsteps
brush past and recede behind me - ahead I hear some music, children's voices and a horse's
hooves walking across the pavé. As I proceed I arrive at the entrance to a small square, the music
has grown much louder - a whistle to my left, apparently coming from a small Judas gate in the
portal of the square, again a whistle - as I approach a voice begins to recount a story, at first
in French, but then it is overlaid by a second voice speaking rather archaic English.

I am told to look up at the small statue that sits in a niche above the portal - I am quite
dumbfounded - how can my simple headset know I am standing here? Anyway the voice starts
into a complicated history concerning the statue which represents a poet - but I decide to move
on. As I walk into the square the voice fades behind me and I enter an atmosphere of wheeled
barrows being trundled over the cobbled surface and over to my left a child singing a rhyme.

(Now if I decided to stay motionless in this square the obvious options for the system
would be that (a) the barrows repeat their trajectory and the child reiterates the rhyme ad
nauseam (b) the system would recognise my continued presence and pick up another sequence).
It is getting late, so I decide to head back to the exhibition centre - as I exit, passing via the
square's portal once more I encounter a soothsayer laying out the cards of a Tarot reading - I
hear the flick and fall of each card as it is placed on the table and then the slow but intense
voice of the reader, describing the scene. Eventually when the sequence has been laid out the
Tarot reading begins in earnest - taking me on a journey through an imaginary landscape, but
it seems that as each of the places and characters are described I can hear their distant sounds,
ghosting in the background. (So I have re-entered a mosaic coordinate and the system has
recognised that we have been here before - and has automatically loaded a fresh sound
sequence for me).

As I approach the rue Rivoli bells begin to peal all over the city, it must be the approach
of Evensong - on the pavement I slowly turn around, locating seven different sets of church
bells, some proximate and some distant. At precisely 18.00 the bells fade and the evening
traffic noise invades my headset - I press end programme and enter into the chaos of rush-hour.

It can therefore be seen that the system can overlay a virtual sound environment onto
real world objects so as to use the system to inform or entertain a user. This allows for use in
many fields such as tourism, outdoor sightseeing, museum tours, a mobility aid for the blind
and in industrial applications.

The ability to spatialize audio around a listener allows more complex
and useful arrangements to be created. In particular, various customizations of the arrangement
of Fig. 2 are possible. For example, Fig. 3 illustrates schematically an
alternative embodiment which introduces the concept of a
virtual audio personal assistant (VAPA) 21, which provides a degree of customisation and
localisation of information relating to the world view of a user 22. The user 22 utilizes the
head tracked and audio spatialized system as before, with audio being rendered by rendering
system 23. Similarly, the audio system can include sound recording capabilities. Preferably,
the sound recording capabilities are provided by B-format microphones or the like which record
spatialization characteristics of the audio, and the audio and associated tracking
information is recorded 24, with portions stored for later analysis 25, before being passed 26 to
the VAPA 21. The VAPA is interconnected to various networks such as the Internet 28,
various service providers 29 and other content providers 30. The VAPA provides a
view of the world customised for the listener 22.

Turning now to Fig. 4, there is illustrated schematically one form of implementation of
the VAPA 21. Many other forms of implementation will be available to the person skilled in
the art of programming and artificial intelligence techniques. The elements of Fig. 4 represent
the core portions of one software design of the preferred embodiment, which can contain the
following components:

A speech and/or symbol recognition unit 35 which takes as an input the recorded audio
stream from the user's environment and applies speech recognition techniques to determine the
content of the speech around a listener, including decoding a user's speech. This unit can also
determine audio gestures such as tongue clicks or the like of a listener so as to provide for
interaction based on these audio gestures. Also, the audio can itself be recorded by audio
recording unit 36.

An audio clip creation unit 38 is responsible for the creation of audio content having a
spatial location relative to a listener. The audio clips are forwarded to rendering system
23 (Fig. 3) for rendering around a listener. The audio clip creation unit can include text to audio
rendering and ideally renders the audio with associated spatialization information for location
around a listener.

WO 01/55833 PCT/AU01/00079

A tracking unit 39 accurately keeps and records the location and orientation of a
listener's head.

A master control unit 40 is responsible for the overall control of the VAPA 21. 

A personality engine 43 is responsible for providing various VAPA personalities to the 
user and interacts with a personality database 43 which stores customisation information of a 
user's interests and activities etc.

The system 21 can include various artificial intelligence inferencing engines and
learning capabilities 44 which obviously are fully extendable and themselves evolvable over
time with advances in AI type techniques.

A contract negotiation engine 45 is provided for the negotiating of transfer of
information and carrying out of transactions across a network interface 46 which interfaces with
external networks 47 in accordance with any regulatory framework that may be in place.

A data cache 48 is provided for storing frequently used data. 

A network interface 46 is provided for connecting with external Internet type networks.

The units of the VAPA can all be interconnected 49 as necessary and can be
implemented on a distributed computer architecture such as a clustered computer system so as
to provide for significant computation resources. It will be obvious to those skilled in the art
that other forms of implementation of the VAPA are possible. Preferably, the VAPA
operates in an environment which is rich in audio information. For example, one such
environment can comprise an extension of the commonly utilised form of Universal Resource
Locators (URLs) which are commonly utilised on the World Wide Web as a data interfacing
and exchange system. Ideally, in the preferred embodiment a URL system is provided which
maps geographic locations to particular unique URLs. An example is shown in Fig. 5,
in which certain geographical locations such as cafes or the
like have an associated geographic URL 50, 51. A listener 52 utilizing the system is
preferably able to access the URLs utilizing a standard interfacing technique such as producing a
tongue-clicking sound. Upon clicking a tongue, the current
orientation of the listener's head is taken into account to access the URL, e.g. 50, associated with
the location 52. Upon the user requesting access to the URL, the VAPA accesses the associated
URL over a computer network so as to download information associated with the URL.
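
Choosing which geographic URL to access from the listener's head orientation might be sketched as follows. The function name, the angular tolerance and the example URLs used to exercise it are hypothetical, and positions are again assumed to be in a local metric east/north frame; downloading the content itself is not shown.

```python
import math

def select_url(listener, head_yaw, sites, max_offset=math.radians(20)):
    """Return the URL of the site closest to the direction the head faces.

    listener: (east, north) position; head_yaw: radians, 0 = north,
    positive towards east; sites: list of (url, east, north).
    Returns None if no site lies within max_offset of the facing direction.
    """
    best_url, best_offset = None, max_offset
    for url, east, north in sites:
        bearing = math.atan2(east - listener[0], north - listener[1])
        # Smallest angular difference, wrapped to [-pi, pi] before abs().
        offset = abs(math.atan2(math.sin(bearing - head_yaw),
                                math.cos(bearing - head_yaw)))
        if offset <= best_offset:
            best_url, best_offset = url, offset
    return best_url
```

Calling this when a tongue click is detected yields the URL the listener is facing, or nothing if the listener is not facing any mapped location.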

In this manner, URLs are mapped to physical objects and individuals which are then
capable of 'broadcasting' personal information, requests, laying trajectories et al. so as to
provide a seamless integration of the experience of the sensory and the informatic realms.
Dynamic objects such as people, planes, dogs and motor vehicles can be tracked by a variety of
sensing systems. The URLs are then accessed so as to stream audio data via the relevant
network server, preferably allowing the users to both send and receive information.

It will be evident that objects are then able to provide a standard interface mechanism to
indicate themselves, enter into negotiations and make transactions with the VAPA. A user is
therefore able to select/query an object of interest (eye tracking, tongue click or other interface)
causing the object to display its data - if this is a commercial object a transactional sequence
might be negotiated, either by the user personally or by the VAPA on the user's behalf. Mobile
objects and people can be dynamically tracked and position located. In the case of an
individual 'broadcasting' information, the VAPA can selectively screen the data and pass on
items of interest to the user who might wish to enter into a direct conversation - alternatively the
two individuals might electronically exchange data, and/or arrange an appointment etc.

Further refinements are possible. For example, ideally the VAPA can take on multiple
'personas', representing various levels of intervention/management/information provision, i.e.
from the informal and friendly to the strictly efficient. The VAPA can also act as a personal
assistant, maintaining a diary, recognising the day's agenda, requesting advice on how to handle
the user, and transacting with external bodies such as taxi companies or the like to order
services, giving the user's URL (and destination and credit card number) which will allow the
service provider to locate the user in physical space.

Depending on the environment and interfaces provided, the user may use a non-verbal
action (e.g. a wink) or, say, a tongue click to indicate an object of inquiry and launch the various AI engines
to search for combinations/links between data associated with physical sites, temporal data
(news/stock exchange) and data stored as knowledge. The VAPA can then make an initial
screening of the data and present the most pertinent elements.

Ideally, the keeping of personal information allows the system to remember what a user
does each day and respond to the user's behaviour. In this way, the user can establish a
complex set of profiles over time - for example work related interests, a network of contacts,
frequently visited physical locations (restaurants, home, work) with which regular sets of
activities are associated, or new locations which are to be visited for which data is selected
according to the user's anticipated requirements. Ideally, the system is able to record what a
user hears for later retrieval and analysis.

Further, the VAPA can preferably modulate the volume of various sound sources
depending on the orientation of a listener. The VAPA can also be capable of tagging audio input
(or data input) to a physical location for later use.

An example utilization of the system is given in the following dialogue: 

I haven't been in this city for a long time, it is evening and I have a few hours to kill
before an appointment. It was a long flight, but after a couple of hours sleep and a shower I
am ready to re-join the human race - to login again.

After dressing I carefully insert the studs of my VAPA (Virtual Audio Personal Assistant)
through my earlobes and gently insert the miniature speaker conduits into my ear canals, a
clear but voice responds to the almost inaudible double click of my tongue

"Oh hello Nigel, we have arrived in Helsinki and it is 21.23, I presume you have slept
well?"

"Uhhuh"

"I have double checked your room bookings and all your appointments have confirmed, 
what are your requests for this evening?" 

"Well this is Helsinki - how about you find me a good bar with Russian food, then 
arrange Tapio to meet me at the Meteori Bookstore at 23.00 - guide me when I leave the
building".

"Do you want a cab?" 

"No thanks - and just be pretty quiet this evening ok - only chat if it is important and
would you turn off that local tourist background - it drives me nuts!"

I leave the hotel and adjust my astrakhan hat - ouch it's cold here, the VAPA assumes
the laid-back 'Robert' persona, his voice over to my right beckons me, "Let's go this way - look
ahead and you will see a large Theatre Building, take the first left after the main entrance and
walk for about 150 metres". Standing at the kerb I stare at the grey bulk of the National
Theatre, I blink as a snow flake brushes my face and immediately the Theatre begins to
announce its programme, with some surround sound musical extracts thrown in to entice me!

"Robert would you turn this thing off - look, I know I haven't been here for a long time
but I want a quiet evening - so go easy on the hot-spots ok, maybe increase the threshold of my
triggers to double-blink and triple tongue-click for a while!"

I walk through the light snow flurries in silence, Robert has suppressed all the normal
weather data, stock exchange, voicemail etc and is doing a good job of filtering the commercial
and historical information which to be sure every structure and surface in this city is capable of
broadcasting.

Again his voice, some 15 meters ahead of me indicates that this is the bar. It sports a
large red star with a Russian script, I rapidly blink my right eye, the bar swirls with sound and
a bass Slavic voice welcomes me in heavily accented English - the bar is called "Zetor" named
after a famous Russian tractor and.... with a single click of the tongue I terminate my host
midway through his recital of today's menu. Entering I take a place at the bar on a well
sprung iron tractor seat and order a Vodka from the bartender, who as is normal winks twice at
me and smiles.

He returns with the shot glass and two slices of dill pickle and in an apologetic tone 
asks if I want to settle in cash as my 'signature' is down. Realising that I am without cards or 
hard currency I quietly ask Robert to restore my URL signature to visibility and I nod 
congenially at the barman, who again winks twice at me (though without smiling this time). 

Credit card details are logged and eventually the barman returns to strike up a casual
conversation

"Well it has been some time since you were here Nigel - has the place changed much?"

"Not at all" I reply, regretting that the Barman now knew who I was, what I did and if he
cared to, could recall every drink I had ever ordered here - perhaps they even had some audio
archives of these conversations! "Maybe you should re-do your virtual doorman out there -
no-one speaks with those Uncle Vanya accents any more - or is it just a Finnish joke?"

In the background the music of 'Rinne-Radio' fills the room (well in a virtual manner)
the bar has recognised my favourite Finnish band and has simulated the ambience on my behalf
- but the big guy over in the corner tapping his feet at an incredible rate must be on some
strange Nordic-Techno!

Robert discreetly pipes up again - unsure about my interest in the feral girl wearing a
leather jacket down at the other end of the bar. Obviously she had 'blinked' me whilst Robert
fixed up the credit card with the barman and decided that we had very similar interests, at least
she had offered to buy me a drink!

"She looks good on paper" offers Robert who closes with the somewhat rhetorical 
question "How is she in physical reality?" 

I decide to take up the offer - but ask the VAPA to close down my signature for the 
while, after all the lady has already downloaded from my URL. As I walk over slowly I fix my
gaze on the leather jacket and triple click my tongue, her general introduction begins to play
out, set into a room ambience of chamber music (looks can be deceiving!). I perform a rapid eye
movement to the left to access her credentials, name, nationality, profession, age and so on. 

I was in the process of clicking off when I must have accidentally queried an object, for
instantly a man's rather elegant wool jacket reeled off a sophisticated sales routine and let me
know that tomorrow the Stockmann department store had a 35% sale on men's wear. My
signature was down so Stockmann's wouldn't be getting in touch with Robert to arrange a 
fitting as it lacked the necessary information concerning my preferred cut, fabric and colour - 
anyway when I travel I still like to do old fashioned window shopping! And now for some old 
fashioned conversation :- 

We exchange greetings and I thank Terhi for the drink. "Tell me more about the book
you are writing," I ask (although Robert has already given me the title), "as you know this is my
field of specialisation."

"Let me remember this conversation" she begins (indicating that her VAPA is audio 
archiving our meeting, logging its location and time - in addition it will be exchanging the data
on our respective URLs and possibly searching for convenient future appointment times) "the
book concerns the history of audio recording and its effects on concepts of human
memory."

The conversation is very animated - the evening passes quickly and a reasonable amount
of Vodka is imbibed. Eventually Robert takes on a slightly hectoring tone telling me that he 
has ordered a taxi to meet me as soon as I leave the building (which I am advised to do ASAP
as I am running late). 

Terhi and I arrange to meet the following week at a concert - her VAPA will liaise with 
mine about the exact arrangements - we take our leave. The barman says goodnight and as I 
pace down the snow covered street I hear a taxi tone playing some way behind me - I decide to
keep walking ahead, simply to keep warm, the driver knows where I am anyhow.

Tapio's voice appears and tells me that I will be there in about three minutes so what 
kind of coffee would I like, coffee with Russian Vodka, or Coffee with Finnish Vodka?...... 

The above scenario is obviously indicative only of the type of functionality that can be 
provided. 

It will be evident to the person skilled in the art that other forms of implementation of
embodiments of the invention are possible. One further alternative embodiment will now be
discussed, initially with reference to Fig. 6, which illustrates a schematic of the hardware
portions of an alternative form of the embodiment. In this embodiment, a user 60 is equipped
with a set of headphones 61 which include a position and orientation tracker 62. The position
and orientation tracker can include a magnetic compass or the like, in addition to GPS receiver
technology. The headphones also include a microphone 63 and are attached to a processing unit
64 for rendering audio spatially. The processing unit is in turn interconnected to a
communications unit 65 which can comprise a mobile phone device or the like. The
communications unit 65 is in permanent connection with a base station 67 so as to transmit
position information and microphone audio to the base station 67 and receive structured audio
and text data or the like from the base station 67. The link can be driven by a communications
interface 68 which acts like a modem transmission system. The execution portions 69 are
provided in the base station. The base station includes a number of processing units 70 which
provide processing capabilities for a number of different virtual audio personalities. The
processing units 70 interact with a state context cache 71 and operate under the control of a
master control program 72. The processing units 70 are in turn interconnected with an Internet
interface 72 which interacts with the Internet 73 so as to download information for forwarding
to the user 60 in an audio format as previously described.
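
By way of illustration only, the use of the position and orientation tracker 62 in spatial rendering can be sketched as follows. The sketch assumes that the tracker reports latitude, longitude and a compass heading in degrees, and computes the direction of a geographically fixed audio source relative to the direction the listener is facing. The function names and the sign convention (positive angles to the listener's right) are assumptions made for this example and do not appear in the specification.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2,
    in degrees clockwise from true north (0 <= bearing < 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def relative_azimuth_deg(listener_lat, listener_lon, heading_deg,
                         source_lat, source_lon):
    """Direction of the source relative to the listener's facing direction,
    in degrees within (-180, 180]; positive values lie to the right."""
    absolute = bearing_deg(listener_lat, listener_lon, source_lat, source_lon)
    rel = (absolute - heading_deg) % 360.0
    return rel - 360.0 if rel > 180.0 else rel
```

As the listener turns, the heading changes and the relative azimuth changes in the opposite sense, so that a source rendered at that azimuth appears to remain fixed in the geographical environment.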

Turning now to Fig. 7, there is illustrated a further schematic diagram of an alternative
embodiment. The alternative embodiment includes a number of VAPAs 80 which each
implement a different audio personality for a user. The VAPAs are interconnected to a network
81 which can comprise the Internet for accessing and downloading information on demand.
Inputs to the VAPAs include position and orientation data associated with the user. The VAPAs
output messages to a message sorting unit 81 which determines which messages shall be
forwarded to the user depending upon a set of user controls 82 and other state data as previously
set by the user. Messages can be in a text or audio format. A subset of the messages is output
from the message sorting unit 81, with text messages being output to a text to speech processor
84. The audio data includes spatialization information and is output to a binauralization unit 85
which spatializes the audio utilizing the head tracking information 86 for output to headphone
devices 87.
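
One possible form of the message sorting step can be sketched as follows, purely as an example. The sketch assumes that each message carries a numeric priority inherited from its VAPA and a category label, and that the user controls consist of a set of muted categories and a limit on the number of messages forwarded. These field names and the particular filtering rule are assumptions for illustration; the specification leaves the sorting criteria open.

```python
from dataclasses import dataclass


@dataclass
class Message:
    vapa: str        # name of the originating VAPA (hypothetical field)
    priority: int    # higher values are forwarded first (assumed convention)
    kind: str        # "text" or "audio"
    category: str    # e.g. "tour", "advertising" (hypothetical labels)
    body: str


@dataclass
class UserControls:
    max_messages: int = 2
    muted_categories: frozenset = frozenset()


def sort_messages(messages, controls):
    """Drop muted categories, then keep the highest-priority messages."""
    allowed = [m for m in messages if m.category not in controls.muted_categories]
    allowed.sort(key=lambda m: m.priority, reverse=True)
    return allowed[:controls.max_messages]


def route(message):
    """Text messages go to text to speech; audio goes to binauralization."""
    return "text_to_speech" if message.kind == "text" else "binauralization"
```

A user who has muted the "advertising" category would, under this sketch, receive only the remaining messages, ordered by the priority of the VAPA that produced them.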

One form of VAPA unit 80 is illustrated in more detail in Fig. 8. Each VAPA can
implement a separate personality and is operated by a personality engine 91 which interacts with
a behaviour and preferences database 92. The database 92 can include details on behavioural
characteristics of the VAPA including such factors as the voice characteristics of the VAPA,
and its priority relative to the other VAPAs. Further, the preferences can include the kinds of
things that the user is interested in, whether the VAPAs of other users near a current user should
be told of the VAPA's presence, whether shops and social services etc. should be told of the
user's presence in the vicinity, and what kind of portals the VAPA will talk to.

The preferred embodiments also allow for a new type of portal (similar to those
provided by the likes of Yahoo etc.). The portals can contain information on, say, a series of
shops selling a particular product in a predetermined area. The portals can include an accredited
level of advertising and sharing of personal data and can further include specialist portals such
as specialist tour guides etc. The VAPA, as illustrated in Fig. 8, sends a series of messages to
the relevant servers and receives a series of responses to each request. The responses are
examined for suitability before being forwarded to the user. An example of a message can, for
example, be "my GPS co-ordinates are x, y, z and I want to know about men's shoes". The
response list might include entries of forms such as "GPS coordinate a, b, c includes Bill's Shoe 
Shop which has a special on Italian shoes for sale". In this manner, the VAPAs are able to 
converse with a world-wide-web type structure for providing information on demand and 
allowing the user to experience an augmented audio reality. 
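
The request and response exchange between a VAPA and a portal can be sketched, as an example only, in the following form. The sketch assumes that the portal holds listings indexed by planar coordinates in metres and by topic, and that a request is answered with the matching listings within a given radius, nearest first. The `Listing` structure, the `query_portal` function and the radius parameter are hypothetical and are introduced only for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Listing:
    x: float            # planar coordinates in metres (assumed local grid)
    y: float
    name: str
    topic: str
    announcement: str


def query_portal(listings, x, y, topic, radius):
    """Return listings on the requested topic within `radius` of (x, y),
    ordered from nearest to farthest."""
    def distance(listing):
        return math.hypot(listing.x - x, listing.y - y)

    hits = [l for l in listings if l.topic == topic and distance(l) <= radius]
    hits.sort(key=distance)
    return hits
```

Each returned listing carries its own coordinates, so that a response can subsequently be rendered to the user at the location to which it relates.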

In various embodiments, the network can include various push advertising scenarios
wherein the owner of a shop or the like pays a fee to make an announcement to a user in their
vicinity of a shop sale or the like. The fee can be divided between the providers of
the network and the users in accordance with any agreed terms. Further, the user can provide a
series of layered personal information facilities. In this manner, information can be revealed
from one VAPA to a second VAPA depending upon the relationship between the corresponding
users' VAPAs. In this manner, VAPAs are able to talk to one another and reveal information
about their users depending upon the access level of the VAPA requesting information. The
VAPAs in a sense can act as agent negotiators on behalf of their users, seeking an audio
approval from their users when required.
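
The layered revealing of personal information can be illustrated with the following sketch. It assumes that the user's information is arranged in numbered layers and that a requesting VAPA with a given access level receives every layer up to and including that level. The layer numbering, the field names and the sample values are assumptions made for this example only.

```python
def reveal(profile_layers, access_level):
    """Merge all layers whose level does not exceed the requester's access level."""
    revealed = {}
    for level in sorted(profile_layers):
        if level <= access_level:
            revealed.update(profile_layers[level])
    return revealed


# Hypothetical layered profile: layer 0 is public, higher layers are more private.
PROFILE_LAYERS = {
    0: {"greeting": "Hello"},
    1: {"interests": ["audio recording", "Finnish music"]},
    2: {"name": "N. Example", "calendar": "free next week"},
}
```

A stranger's VAPA holding access level 0 would obtain only the public greeting, whereas a VAPA belonging to an acquaintance with access level 2 would obtain all three layers.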

Various billing arrangements can be provided depending on the level of service provided.
Further, listeners may receive a portion of revenues for listening to advertisements in the
system. Further, specialist tours could be provided, with the implementer of the system
negotiating with famous persons or the like to conduct an audio tour of their favourite place.
For example, "Elle McPherson's Tour of Dress Shops in Paddington" could be provided.
The preferred embodiments obviously have extension to other areas such as military
control systems or the like. Further, multiple different VAPAs with different
personalities can be presented to a user in an evolving system.

It will be understood that the invention disclosed and defined herein extends to all 
alternative combinations of two or more of the individual features mentioned or evident from 
the text or drawings. All of these different combinations constitute various alternative aspects of
the invention. The foregoing describes embodiments of the present invention and modifications,
obvious to those skilled in the art, can be made thereto without departing from the scope of the
present invention. 
Claims

1. A system for providing a listener with an augmented audio reality in a 
geographical environment, said system comprising:

a position locating system for locating a current position of a listener in said 
geographical environment; 

an audio track creation system for creating audio tracks having a predetermined 
spatialization component in said geographical environment; 

an audio track rendering system adapted to render an audio signal to a series of 
speakers surrounding an apparent listener such that said listener experiences an apparent 
preservation of said spatialization components in said listening experience; 

an audio track playback system interconnected to said position locating system 
and said audio track creation system and adapted to forward predetermined audio tracks to said 
audio rendering system depending on a user's location in said audio environment,

such that said series of speakers locate said predetermined audio tracks in said 
environment so as to provide for an augmented audio reality. 

2. A system as claimed in claim 1 wherein said system simultaneously provides an 
augmented audio reality to multiple listeners located in said geographical environment.

3. A system as claimed in any previous claim wherein said speakers comprise a set of 
headphones. 

4. A system as claimed in any previous claim wherein said position locating system 
includes locating a current orientation of a listener's head and said rendering system utilises said 
current orientation in rendering said spatialization components. 

5. A system as claimed in any previous claim wherein said system is used in one of 
tourism, outdoor sight seeing, museum tours, a mobility aid for the blind and in industrial 
applications, artistic performances, Indoor Exhibition Spaces, Outdoor Exhibition Spaces,
Tours, Exhibitions, City Tours, both guided and self-guided, Botanical Gardens, Zoos, 
Aquariums, Entertainment, Themeparks, Interactive theme environments, VR Games, 
Construction, auditory display of data such as plans or existing structures below ground, 
Architectural on-site walk throughs. 

6. A system as claimed in any previous claim wherein said position locating system
includes at least one of a compass, a global positioning system, a radio frequency positioning
system or an electromagnetic wave positioning system.

7. A system for providing an immersive audio environment around a listener, said 
system comprising: 

an audio spatialization system for spatializing the audio of a spatialized audio feed
around a listener;

an audio customization unit for customizing audio content for said listener thereby 
creating said spatialized audio feed; 

a computer network, attached to said audio customization unit for downloading said 
audio content. 

8. A system as claimed in claim 7 further comprising: 

a user feedback unit interconnected to said audio customization unit, for monitoring user's
feedback in response to said spatialized audio feed. 

9. A system as claimed in claim 7 or 8 wherein said computer network includes 
audio content indexed by geographical location. 

10. A system as claimed in any of claims 7 to 9 wherein said computer network
includes textual content indexed by geographical location and said audio customization unit 
includes a text to audio rendering unit for rendering said text into audio. 

11. A system as claimed in claim 8 wherein said feedback unit includes a 
microphone for monitoring said user's environment.

12. A system as claimed in claim 11 wherein said microphone provides spatialization
characteristics of the audio in said user's environment.

13. A system as claimed in any previous claim 7 to claim 12 wherein said audio 
customization unit includes: 

at least one personality control unit, customizing said audio content with a 
personality having predetermined characteristics. 
14. A system as claimed in any previous claim 7 to 13 wherein said audio customization
unit is adapted to send a series of information requests containing geographical 
indicators to said network, and receive therefrom a series of responses containing 
geographical indicators for rendering to said user. 

15. A system as claimed in any previous claim 7 to 14 wherein said audio customization 
unit of a first user is adapted to interact with the audio customization units of other users 
so as to exchange information. 

16. A system as claimed in claim 15 wherein said exchange of information is dependent on
the particular user with whom an exchange is made. 

17. A system as claimed in any previous claim 7 to 16 wherein said computer network 
includes a series of portals answering requests for information by said audio 
customization units. 

18. A system as claimed in claim 17 wherein said audio portals include personality 
customized information utilised in answering requests for information. 

19. A system substantially as hereinbefore described with reference to the accompanying
drawings. 